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Summary. The Waxman random graph is a generalisation of the simple Erdos-Renyi or 
Gilbert random graph. It is useful for modelling physical networks where the increased cost 
of longer links means they are less likely to be built, and thus less numerous than shorter 
links. The model has been in continuous use for over two decades with many attempts 
to select parameters which match real networks. In most the parameters have been arbi¬ 
trarily selected, but there are a few cases where they have been calculated using a formal 
estimator. However, the performance of the estimator was not evaluated in any of these 
cases. This paper presents both the first evaluation of formal estimators for the parame¬ 
ters of these graphs, and a new Maximum Likelihood Estimator with 0(n) computational 
time complexity that requires only link lengths as input. 

1. Introduction 

The study of random graphs provides insight into the formation of real networks and 
methods to synthesise test networks for use in simulations. Early research was dominated 
by the study of Gilbert-Erdos-Renyi (GER) random graphs (Gilbert, 1959; Erdos and 
Renyi, 1960), which link two vertices independently with a fixed probability, and hence 
select particular graphs (with a fixed numbers of nodes) all with equal probability. Due 
partly to its mathematical tractability this model was studied for many years despite 
clear limitations in its applicability to real networks. 

Random graphs explicitly incorporating geometry became popular with the intro¬ 
duction of the Waxman random graph (Waxman, 1988), proposed as an alternative to 
the GER random graph as a more realistic setting for testing networking algorithms. 
The Waxman graph has subsequently been used in applications as wide-ranging as com¬ 
puter networks, transportation and biology (Waxman’s original paper is listed by Google 
Scholar as having been cited around 3000 times). Moreover, the model has been rein¬ 
vented at least once so not all use cases can be traced back to this paper. 

The GER random graph links every pair of vertices independently with a fixed prob¬ 
ability, whereas the Waxman graph reflects that in real networks longer links are often 
more costly or difficult to construct, and their existence therefore less likely. R links 
nodes i and j with a probability given by a function of the distance dij between them. 
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The form chosen by Waxman was the negative exponential*, i.e., 

= ( 1 ) 

for parameters g, s > 0. A Waxman random graph is generated by randomly choosing a 
set of points in a section of the plane (usually the unit square), and then linking these 
points independently according to the distance between them and (1). 

Despite frequent use of the Waxman graph there is no formal literature on how to 
estimate the parameters (g, s) from a given graph, or set of graphs. In most works 
the parameter values have been chosen arbitrarily with the authors giving little or no 
justification for the values used. In many cases authors use the parameter values given 
in earlier works without regard for their origin or applicability. A few works use more 
careful estimates, but do not evaluate the estimator performance. They simply apply 
estimates and report results. 

This paper presents the first work of which we are aware on the formal estimation 
of the parameters of the Waxman graph. As part of our investigation we explain many 
properties of the Waxman graph and its likelihood function, and derive: 

• its average node degree, expected number of edges, and average edge length; 

• asymptotic expressions for these for large s; 

• the Kullback-Leibler divergence between Waxman graphs and the GER graph; 

• minimal sufficient statistics for the model and hence a MLE for the model param¬ 
eters. 

We present the Maximum Likelihood Estimator (MLE) under the assumption of in¬ 
dependence between links, and demonstrate that its performance is close to the Cramer- 
Rao lower bound. We also compare the MLE to other existing approaches and show its 
advantages in: 

• accuracy; 

• computational time complexity (by suitable sampling of the existing link lengths 
0(n) can always be achieved); 

• minimal input (using only the lengths of links which exist or a sufficiently large 
sample of such links). 

Einally we use the MLE to estimate the distance dependence of three real networks, 
using one biological and two Internet datasets. 

2. Background and Related Work 

A graph (or network) consists of a set of n vertices (synonymously referred to as nodes) 
which without loss of generality we label V = {1, 2,..., n}, and a set of edges (or links) 

*Note that our parameterisation differs from that in much of the literature, for reasons to be 
explained in the following sections. 
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£ C V X V. We denote the number of edges hy e = \£\. Here we are primarily concerned 
with undirected graphs, though much work on random graphs is easily generalised to 
directed graphs. 

Two nodes i and j are adjacent or neighbours if {i,j) € £■ They are connected if a 
path i = ni—n 2 — ■ ■ •—= j exists, where (ni,n;+i) G T, and the graph is connected if 
all pairs of nodes (i, j) are connected. 

The GER random graph Gn,p of n vertices is constructed by assigning independently 
and with a fixed probability each tuple of nodes {i,j) to be in £. The Waxman random 
graph generalises this by making the probability of each tuple of nodes an edge dependent 
on a function p of the distance between the two nodesf. 

While the Waxman graph was originally dehned using Euclidean distances on a rect¬ 
angle or straight line with points on an integer grid, and most subsequent work has 
considered graphs dehned with points randomly placed in the unit square, there is no 
impediment to choosing points in an arbitrary convex region^ with an arbitrary distance 
metric. Hence, we dehne a Waxman graph by placing n nodes uniformly at random 
within some dehned region R of a metric space H with a distance metric d{x, y) and 
each pair of nodes is made adjacent independently with probability given by (1). 

This differs from Waxman’s original parameterisation § (Waxman, 1988) 

= ( 2 ) 

for parameters a, (3 > 0, where L is the greatest distance possible between any two points 
in the region of interest. The advantage of Waxman’s parameterisation is that a is a 
dimensionless constant. Unfortunately, previous authors have confused this notation by 
reversing the roles of a and f3 with almost equal regularity, to the point where parameters 
chosen in one paper have been reversed in another purporting to compare results. Hence, 
we provide an alternate parameterisation (1) chosen with the estimation problem in 
mind. We deliberately avoid a dimensionless parameter because real estimates naturally 
have units and it has been common in the literature to assume incorrectly that use of 
such a parameter removes dependence on the region shape. 

The parameter s determines how dependent links are on distance. Larger s decreases 
the likelihood of long links. 

The use of f3 in past works on the Waxman graphs has allowed for /? > 1, at which 
point the function (2) must be truncated, creating a corner in the deterrence function 
shape. This is undesirable from the point of view of parametric estimation, so we restrict 

fSome examples of use of the Waxman graph include (Waxman, 1988; Thomas and Zegura, 
1994; Zegura et ah, 1996, 1997; Doar and Leslie, 1993; Wei and Estrin, 1994; Salama et ah, 1997; 
Verma et ah, 1998; Matta and Guo, 1999; Shaikh et ah, 1999; Fortz and Thorup, 2004; Neve 
and Mieghem, 2000; Rastogi et ah, 2001; Wu et ah, 2000; Guo and Matta, 2003; Kuipers, 2004; 
Kaiser and Hilgetag, 2004b, a; Gunduz et ah, 2004; Garzaniga et ah, 2004; Lua et ah, 2005; Holzer 
et ah, 2005; Wang et ah, 2005; Ahuja et ah, 2009; Huang et ah, 2007; Malladi et ah, 2007; Tran 
and Pham, 2009; Fang et ah, 2011a,b; Gosta et ah, 2010; Janssen, 2010; Davis et ah, 2014). 

IGonvexity is not strictly required, except where there is the notion that the links are physical, 
and must themselves lie in the region of interest. 

§Although users of the Waxman model commonly state that a G (0,1] there is no reason it 
should not take arbitrary positive values. 
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q G (0,1]. In this case q is a thinning of the edges, and we shall see that this allows 
decoupling of estimation of the two parameters. 

There are many variants of the Waxman graph through modifications of the distance 
deterrence function, for instance Zegura et al. (1997) proposed p{d\ j3) = (3 exp(—d /{L — 
d)). One of the more interesting variants, which we refer to here as the DASTB (Davis- 
Abbasi-Shah-Telfer-Begon) graph (Davis et ah, 2014), takes Waxman-like rates for con¬ 
tacts between the agents represented by vertices in the graph, and then considers an 
edge to exist if at least one contact is made. The number of contacts is assumed to have 
Poisson distribution with parameter kij given by the Waxman function (1), and then 
the probability of at least one contact is Pij{d) = 1 — exp(—/cjj). We do not formally 
consider this model here (though we do adapt their estimation approach) except to note 
that for sparse networks, the low probability of contact means that the two ensembles 
of random graphs are similar. 

Properties such as connectivity and path lengths in Waxman graphs have been stud¬ 
ied in several works (Thomas and Zegura, 1994; Zegura et ah, 1996, 1997; Van Mieghem, 
2001; M.Naldi, 2005). However, few have considered the estimation of parameters. In¬ 
deed, some of the earlier works compared graphs merely by visual inspection. 

The first work we are aware of which attempted to estimate the parameters of a 
Waxman graph from real data is that of Lakhina et al. (2002). Using a large set of 
real Internet and geographic data the authors found that an exponential distance-based 
probability was reasonable. The authors conducted the study with some care, comparing 
two sets of data and finding consistent results by using a log-linear regression of the link 
distance function against the link distances. However they made no effort to consider 
the efficiency or accuracy of their estimator. 

Estimation has also been used on the DASTB model (Davis et ah, 2014) by deter¬ 
mining the parameters of a binary Generalised Linear Model (GLM). We can adapt this 
approach to the almost equivalent Waxman graphs by considering that links are the bi¬ 
nary outcomes of a treatment which is functionally dependent on link distances through 
the distance deterrence function, then use a GLM to estimate the parameters. 

Davis et al. (2014) showed, using their parameter estimations and Akaike’s Informa¬ 
tion Griterion corrected for small sample sizes (AIGc), that the DASTB graph was a 
better model for their data than the GER (and provided further comparisons against 
more complex models), and showed a correlation between s and the population density. 
This has implications for pathogen persistence in wildlife because density has often been 
considered a critical parameter of disease transmission. However, if contact over longer 
distances increases in lower-density populations then pathogen persistence may not be 
so sensitive to host density, as dense populations do not mix so freely. 

The underlying assumption of Lakhina et al. (2002) and Davis et al. (2014) is that 
all of the distances are known, even for links that do not exist. This is possible if all 
node locations are known, but in practice it is sometimes easier to measure the length 
of a link that exists - say by probing it - than to estimate the length of a hypothetical 
link. Consider also graphs for which the “distance” is not a physical quantity, but rather 
a cost or an administratively configured link-weight, for instance in computer network 
routing protocols such as OSPE link weights can be the output of an optimisation (Fortz 
and Thorup, 2003). In these cases link “distance” does not even exist for non-links. 
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We shall also show that techniques which depend on existence tests of all edges have 
time complexity 0{v?), where n is the number of nodes in the graph, whereas our 
estimation method has time complexity 0(e), where e is the number of edges in the 
graph. This can results in a dramatic improvement in computation time as large real 
graphs are often very sparse. 

Waxman graphs are an example of the general class of SERNs (Spatially Embedded 
Random Graphs) (Kosmidis et ah, 2008). One aim of this work is to develop intuition 
which can be extended to estimation of the parameters of random graphs from the 
general class. We will leave consideration of the general case to later work because 
complex questions of existence and identifiability arise. 


3. General properties of Waxman graphs 

The starting point for the creation of a Waxman random graph is to generate a set of 
n points uniformly at random in some region i? of a metric space El. For any given 
region we can derive a probability density function g(t) for the distance between an 
arbitrary pair of random points. This is commonly called the Line-Picking-Problem: 
for instances of regions including lines, balls, spheres, cubes, and rectangles see Ghosh 
(1951); Rosenberg (2004) and Weisstein (c,b,e,d,a). 

Given the distribution of distances between points, we can calculate the probability 
that an arbitrary link exists (prior to knowing the distances): 

1*00 

P{(i, j) G T I g,s} = g / exp{-st)g{t)dt = qG{s), (3) 

Jo 


for any i ^ j, where G{s) is the Laplace transform of g{t) (or equivalently it is the 
moment generating function w.r.t. to —s). We know that the Laplace transform at 
s = 0 of a probability density is the normalisation constraint, so G(0) = 1. Hence when 
s = 0 there is no distance dependence and the result is the GER random graph. 

From this probability we can also compute features of the graph such as the average 
node degree 

k= {n-l)qG{s), (4) 

from which we can derive values of q that produce given average degree for a given 
network size and s. From the handshake theorem we can derive the average number of 
edges to be 

e = n{n — l)qG{s)/2. (5) 

We can then derive the distribution of the length d of a link in the Waxman graph, 
and we denote this by f{d \q, s) = P{dij = d \ {i,j) E £} which is 


f{d I q,s) 


P{(i, j) G £ I d(ij) = d-,q,s}P{dij = d} 
P{(i,j) g£ \ q,s} 
g{d) exp(—sd) 

6 (^^ ■ 


( 6 ) 




6 Roughan eia\. 

Note the g is a thinning parameter, and thus should not change the link length distri¬ 
bution, and we can see that it vanishes from the distribution. Hence we generally write 
/ omitting q, i.e., f{d \ s). 

The formula allows easy calculation of the mean length of links in the Waxman graph 


1 r"isi 

Hd I s] = / tg{t) exp(-st) dt = 

G[s) Jo G{s) 


(7) 


For small distances t, region-boundary effects are minimal, and so the function g{t) 
depends only on the dimension of the embedding space. For example the square and 

disk both have g{t) 27rt, which comes from the size of the ring of radius t. Given a 
Euclidean distance metric on a /c-dimensional space the small t approximation is 


9{t) ~ 


r{k /2 + l) 


t 


k-l 


( 8 ) 


which is the surface area of the {k — l)-sphere (the surface of the hyper-sphere embedded 
in the /c-dimensional Euclidean space). 

Using the following theorem, we can derive large s approximations for the Laplace 
transforms of a Cumulative Density Eunction (CDE). We work here mainly with the 
probability density function g{t), and this must be integrated to get the CDE, but 
otherwise the applicability should be clear. 


Theorem 1 (Teller (1971), pp. 445-6). If 0 < a < oo, and we have a positive 
measure concentrated on (0, oo) defined by the CDF H{t) with Laplace-Stieltjes transform 
II{s), then 

^ if(»)'-“ (9) 

where L{t) is a slowly varying function at 0, where this is defined as a function where 
L{xt)/L{t) = 1 for all x > 0. 


Applying 1 for a /c-dimensional Euclidean space we get 


G(s) 

(5'(s) 

E[d I s] 


™ 7T^/^T{k + l) 

T{k/2 + l) ^ ’ 

s-^oo -n7r^/^F(A: -k 1) ^-k-i 

r{k/2 + l) ^ 


( 10 ) 

( 11 ) 

( 12 ) 


The Waxman graph is usually defined in terms of a number of nodes and two pa¬ 
rameters {q,s). However, the critical feature relating these two is the node density on 
the space p. This parameter determines the number of nodes within a certain distance 
of a given node. Davis et al. (2014) showed the crucial importance of the relationship 
between density and the spatial deterrence function. We can see from the small t, large 
s approximations that this relationship is (absent of boundary effects) a simple inverse 
relationship that generalises to arbitrary dimension. Intuitively it arises because for 
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large s only short links are likely, and hence the boundaries of the region play little part 
in the distribution. 

As a measure of the difference between the Waxman and GER link-distance distri¬ 
bution we use the Kullback-Leibler Divergence (KLD)^, which provides an Information 
Theoretic measure of the difference. 

The KLD of distribution f{t\s) from g{t) is defined by 

KLD{g{t)\\f{t\s)) = g{t)\nj^dt. (13) 

From (6) we can derive 

poo 

KLD{g{t)\\f{t\s)) = / g{t)[lng{t)-lnf{t\s)]dt 

Jo 

g{t) [ lng{t) — lng{t) -|- sf -|- In (5(s)] dt 

poo poo 

= s tg{t) dt + lnG{s) / g{t)dt 
Jo Jo 

= sE[d]+ lnG(s). (14) 



Taking the small t, large s approximations we can see that 


KLD{g{t)\\f{t\s)) 


S^OO 


S^OO 

r\j 


k 

/c - Invr -|- lnr(fc + 1) — lnr(/c/2 -|- 1) — /sins 

c{k) — /sins. 


That is, the deviation for large s of the length distribution of the Waxman graph from 
the GER graph is a constant dependent on the dimension of the space minus a term 
proportional to the log of s. 

The likelihood function for a particular Waxman graph under the usual independence 
assumption, given the lengths of the observed links d, is 


£(s I d) = f{dij I s). 


(15) 


We apply (15) below, even though links in the Waxman graph are only conditionally 
and not truly independent given the distances. This can be intuited by considering 
triangles (edges connecting sets of three nodes). The fact that the underlying distances 
are a metric forces the three distances to satisfy the triangle inequality and thus the 
three must be correlated. Fortunately the correlation is largely local. For instance, if we 
consider two distinct node pairs (R, ji) and {i 2 ,j 2 ), then the existence of edges 
and (* 2 )^ 2 ) is independent (given no extra information). Thus correlations are largely 
mediated through common nodes. As a result we should expect weaker correlations as 
the number of nodes n grows (given a fixed number of edges) and similarly we expect 
the correlations to increase as the average node degree increases. We demonstrate how 
this affects various estimators below. 

^The KLD is not strictly a distance metric because it lacks symmetry, hence the term “diver¬ 
gence” . 
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4. Estimation Techniques 

This section describes various approaches to the parameter estimation of Waxman ran¬ 
dom graphs. 


4.1. Log-linear regression 

The first estimation method applied to this problem was introduced by Lakhina et al. 
(2002). They noted that 

^ cexp(-sd), (16) 

9{d) 

where we can see from our calculations that c = IjG. If we could form this ratio, we 
might estimate s by log-linear regression against d. The observed lengths of links in the 
graph yield implicit measurement of the numerator in (16) and we could obtain estimates 
of g{d) either: 

(a) analytically: use the shape of the region to compute g{d); or 

(b) empirically: use the distances between all of the nodes (not just the linked nodes) 
to estimate g{d). 


The former has the advantage of being fast. The latter exploits the data itself in the 
case that the region is not regular, but computationally it is O(n^). The latter approach 
was used by Lakhina et al. (2002), but we shall evaluate both. 

We also need to estimate f{d \ s): Lakhina et al. (2002) did so by forming a histogram. 
We shall use this approach, so the estimator proceeds by counting the number of edges 
in each length bin, and dividing by the expected number in that bin absent the distance 
deterrence function. 

Once we have computed s, we estimate q by inverting (5) to get 


2e 

n{n — l)G(s) ’ 


(17) 


where n and e are the observed number of nodes and edges respectively. The decoupling 
of estimation of s and q makes this sequential estimation possible, and this will be 
exploited in other estimators described below. 

The time complexity of the above algorithm based on the analytical formulation of 
g{d) is 0(e) but if we were to estimate g{d) using the distances between all nodes then 
it would be O(n^). Our evaluation focuses on the time to perform the regression, i.e., 
we do not include the time to construct the distances and empirical histograms as these 
are problem dependent. For instance. Euclidean distance calculations are fast, but in 
one case we compute distances on the globe (including corrections for its non-spherical 
nature) and these calculations can be considerably slower. 


4. 2. Generalised Linear Model (GLM) 

Davis et al. (2014) use a GLM to estimate the parameters of their model, ours is slightly 
different. In order to make the difference in usage explicit we include a description of 
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the framework taken from (Casella and Berger, 2002, p. 591). A GLM consists of three 
components: 

A random component: The response variables Yi,Y 2 ,... ,Yn. These are assumed to 
be independent (but not identically distributed) random variables from a specified 
exponential family. 

A systematic component: a linear function of a vector of predictor variables 

k 

/3o+(18) 
m=l 

where is the mth predictor variable. 

A link fnnction: r(p) such that 


k 

r(ix0=/3o+(19) 

m=l 


where pi = lE[y)], and Xi is the predictor vector for the ith response. 

Estimation of the Pm is usually by maximum likelihood, i.e., we write the log likeli¬ 
hood, differentiate with respect to Pm and set to zero, giving k + 1 non-linear equations 
which are solved numerically. Here we use Matlab’s glmfit to perform the task. 

We index the variables by their edge label {i,j) and the response variables are 
indicators of edges so Y is the adjacency matrix of the graph, i.e., 


f 1, if (bj) e 

1 0, otherwise. 


( 20 ) 


These are modelled as Bernoulli random variables, and hence ppj) = p{dij), i.e., the 
probability that edge exists given that the distance between nodes i and j is dij. The 
predictors are the distances, i.e., = dij and k = 1. 

The probabilities are assumed to follow (1), so the natural link function is log. In 
this case we get 

^og{p(ij)) = Po + Pidij, (21) 

and make the natural identification that Po = log(g) and Pi = —s. 

The GLM of Davis et al. (2014) is almost identical but for the use of the link function 
r{P{i,j)) = log(-log(l -p(ij))). 

When the response random variables are Bernoulli, it is more conventional to use 
a logit link function. The advantage of this is that it maps the entire range of linear 
combinations of the systematic component into the interval (0,1), and thus produces 
legitimate probabilities. 

Using log as the link function for modelling Waxman graphs means there is an implicit 
bound on the parameters Pm such that Pq + Pidij must be positive. As long as the 
estimator returns positive parameters, this is assured, but in cases such as small s it is 
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possible to have examples where natural estimates lie on the boundary. Similar issues 
arise for our estimator, so we defer further discussion until section 4.5. 

One potential problem with fitting a GLM is that if there is a cutoff c such that for 
all dij < c the edge exists and for all dij > c the edge is absent, then the GLM may 
not find finite coefficients that best fit the data and estimates will have large errors. 
This is unlikely to happen when estimating the parameters of large networks since the 
probability as a function of d decays exponentially, so there is no clean cutoff. However, 
in small networks there may only be a few edges, and it is possible that the data exhibit 
such a cutoff by chance. In cases with n ~ 10 we observe very poor performance for 
GLM fits as a result of this phenomena, but it has negligible impact on the accuracy of 
GLM fits for larger networks. 

The GLM method uses all possible edges: those that exists and those that do not. 
Hence the number of response variables in the problem is 0{v?), and so the computa¬ 
tional time complexity is also O(n^). 

4.3. Sufficient Statistics 

An obvious question arises as to what is a set of minimal sufficient statistics for use 
in the estimation of Waxman random graph parameters. To answer this we apply the 
following theorem: 

Theorem 2 (Casella and Berger (2002), Theorem 6.2.13, p.281). 

Let /(x I 9) be the PMF or PDF of a sample X. Suppose there exists a function T(x) 
such that, for every two sample points x and y, the ratio /(x | 9)/ f(y \ 9) is a constant 
function of 9 if and only ifT{x.) = T{y). Then T(X.) is a minimal sufficient statistic 
for 9. 

Take the sample to be the set of edge lengths d = (di,... , de) where dk = d(^ij'j for 
{i,j) G £■ Under the independence assumption the PDF of a sample is Y\i=i f {di\s, q), 
conditional on the number of edges e, where / is defined in ( 6 ), and e is binomially 
distributed (e) where N = n(n — l)/2 and p = qG{s). 

Gonsider two samples: x = (xi,..., XeJ and y = (yi,..., yes); then the ratio in 2 is 

Bp {ei)UtUf{xi\s,Q) ^ B^ (ei) Y\T=i gjxQe-^^'/G{s) 
B^ie2)n?=if{yi\s,q) ~ B^{e2)Ul=i9{yi)e-^y'/G{s) 

Gis^B^je,) 

nT=i9iyi) nT=ie-^y' G{sY-B^{e 2 ) 

ic ) 

= m(x, ( 22 ) 

( 62 ) 

where d^ and dy are the averages of the distances in datasets x and y respectively, and 
m(x, y) is a function only of y(-) of x and y, independent of the parameters {q, s). 

The ratio above depends on the parameters s and q only through the statistics d 
and e. Hence if ei = 62 and dx = dy, the ratio is a constant function of {q, s). On the 
other hand, if either ei Y 62 or dx Y dy, then the ratio is a non-constant function of the 
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parameters. Hence the conditions of the theorem are satisfied and (e, d) forms a minimal 
sufficient set of statistics. Notably, these statistics can be constructed almost trivially 
in 0(e) time, and we will use that fact to construct a MLE. 


4.4. Maximum Likelihood Estimator 

We consider the log-likelihood function £(s | d) = ln£(s | d) and note from (15) that 

I d) = ^ ln/(d(ij) I s) 

= - InG(s) 

bj)e£’ 

= -eln(5(s)-F Ing{d(^ij)) - s Y %i)- 

(*d)6£ 

Our goal is to find s such that the partial derivative of i{s \ d) w.r.t. s is zero, i.e., 

— 1^(5 I d) = -e ^ - Yj ^ 

so we need to find s such that 


G(s) 


1 

e 


{i,j)££ 


-d, 


(24) 


where d is the observed mean link length. 

From (7) we know —G'jG is the expected length of line segments on the Waxman 
graph, so the MLE of s is also the moment-matching estimator. 


4.5. Existence and uniqueness of the MLE 

When will a unique solution to (24) exist? We know from the definitions of G{s) and its 
derivative that G(s) > 0, and h{s) = —G'{s)/G{s) is a continuous, positive function for 
all s, so if h{-) is monotonic there will be at most one solution to (24) for any given d. 
Consider the derivative 


dh _ (5'(s)2 - G”(s)G{s) 

~ds ~ G(s)2 ■ 

Now the numerator is positive, and from Schwarz’s Inequality 

2 


( 25 ) 


G'"(s)G(s) = J ^/t‘^g(t)e dt ■ j \/~g{i) 


o — St 


dt 


> 


tg{t)e dt 


= 
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and so the denominator is negative, and hence the function h{s) is monotonically de¬ 
creasing with s. 

When s = 0, the Laplace transform (5(0) = 1 and hence /i(0) = f tg{t)dt = g, the 
average distance between pairs of nodes. When we remove longer links preferentially, the 
average link distance must decrease, so it is intuitive that h{s) is a decreasing function, 
starting at /i(0) = g = dmax, which is the maximum expected edge distance over all 
possible parameters s. 

In the limit as s —?• oo we already know from (12) that h{s) = E[(i] ~ k/s, and so we 
know that in the limit as s —)■ oo that h{s) —)■ 0, so for any measured d £ (0, dmax] there 
will be a unique solution to (24). 

Unfortunately, it is possible for the sample mean of the edge distances d > dmax, 
i.e., for a particular graph to have an unlikely preponderance of longer links. In this 
case (24) has no solution for s £ [0,oo), which seems to create problems. However, the 
obvious interpretation of d > dmax is that then there is no evidence that long links have 
been preferentially filtered from the graph, and so it is natural in this case to assume 
the model should be the GER random graph, i.e., s = 0. 

In formal terms, the MLE satisfies some but not all of the properties required for 
it to be consistent. Standard asymptotic theory for MLEs requires the condition that 
the true parameter value lies away from the boundary to form consistent estimates that 
converge to the true value as the amount of data increases. 

For more formal consideration of this boundary case, we could introduce a strict 
hypothesis test for GER vs Waxman graphs, but for the moment we take s = 0 in 
these cases. We will see that in these cases that this leads to a slight bias in estimates 
for graphs with small s. The bias leads to a variance-bias tradeoff, leading to a RMS 
error in the estimator that can be lower than the Cramer-Rao (CR) bound (for unbiased 
estimators). 

In the case where the parameter s does lie away from the boundary, ideal MLEs have 
a number of useful properties: 

• Consistency: a sequence of MLEs converges in probability to the value being esti¬ 
mated as the amount of data increase: in particular the estimates are asymptoti¬ 
cally unbiased. 

• Efficiency: i.e., it achieves the GR lower bound when the sample size tends to oo, 
i.e., no consistent estimator has lower asymptotic mean squared error. 

Another way to state these results is that MLEs are asymptotically normal, i.e., as the 
sample size increases, the distribution of the MLE tends to the Gaussian distribution 
with mean given by the true parameter, and covariance matrix equal to the inverse of 
the Fisher information matrix. 

However, we know that our MLE does have a potential problem at the boundary, 
and that the independence assumption is an approximation, and so it is interesting to 
consider whether the ideal bound is actually achieved. In the next section we derive the 
CR lower bound against which we make comparisons. 
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4.6. Bounds 

The CR lower bound is the minimum variance of any unbiased estimator, under certain 
mild conditions. In the case of a scalar parameter such as s, the bound is given by the 
inverse of the Fisher information I{s) defined by 


I{s) = -E 


s)\ 

) 


(26) 


Here we have already computed the first derivative of the likelihood in (23), 
second derivative is 

d^£ _ dh 
ds‘^ ^ ds ’ 


and its 
(27) 


where we calculated dh/ds in (25). Taking the expected value, and noting (5), we get 


I{s) 


n{n — l)q 
2 


G"{s)Gis) - G'{sf 

G(^)^ 


(28) 


The net effect of this result is that an estimator that achieves the CR bound would have 
variance 0(l/n^). 

However, that assumes that growth of the number of edges is 0{n?), but a more 
reasonable model might increase the size of the network while keeping the average node 
degree k constant, in which case (4) implies that the variance will be 0(l/re). Examples 
of this can be seen in the results. 

Even if the network is not sparse, if we randomly sample 0(n) edges, then the re¬ 
sult above implied that the variance will be 0(l/n), with the benefit that (for some 
estimators) the cost of computation will now be 0{n). 

The mean squared error of an estimator is composed of the variance plus the bias- 
squared, so it is possible for a biased estimator to improve on the bound (as we shall see 
in special cases below), but in general we measure estimator efficiency (with respect to 
accuracy as a function of the amount of data) with respect to this bound. 


4.7. MLEofq 

As noted above, once we know s, we can use this to estimate q using (17). If we know 
the exact value of s then q acts as a random filter of links, so estimation of q is equivalent 
to estimating the parameter of a Bernoulli process. Thus (17) would be the MLE of q 
given the true value of s. The question is whether it is still a MLE when we derive it 
from the estimated value of s. 

To answer this question we draw on the property of functional invariance of MLE 
i.e., if parameters are related through a transformation a = g{s), then the MLE of a is 
d = g{s). This property allows the conversion back to the original Waxman parameter 
a = IjsL, for instance. More importantly it means that q = 2e/n(n — l)G(.s) is the 
MLE of q, given a MLE of s. 

Once again, the property above applies for the ideal MLE. We test the estimator q, 
and we show that its properties are close to those of the ideal MLE, except when s is 
small. 









14 Roughan eial. 

4.8. Numerical calculation of the MLE 

MLEs are often computed directly using techniques such as Newton’s method. In this 
problem, each iteration would require computation of a Laplace transform||, and its 
derivative, and so will be somewhat slower than many MLEs. However, we can improve 
this performance in several ways. First, we rearrange (24) in the form 

dG{s) + G'{s) = Cr[{d-t)g{t)-s] = 0, (29) 

which reduces the number of transforms that need to be calculated. 

The estimation algorithm is a ID search using Matlab’s fzero function. No doubt 
faster implementations are possible using Newton-Raphson or other approaches, but we 
found this to converge in only a few steps, given reasonable starting bounds, which we 
choose to be [0,k/d] (where the upper bound is given by the asymptotic form of the 
average distance (12)). 

In the particular case that d > dmax, be., the sample mean of the edge distances is 
greater than the largest expected edge distance, then we take s = 0. 

The speed of calculation can be improved if we precalculate g{t) at a set of fixed grid 
points, and reuse this in all of the Laplace transform calculations, thus avoiding frequent 
recomputation of the density. We create a second estimate, the MLE-N (Numerical), in 
which we calculate the inverse CDF at uniformly spaced probabilities to minimise errors 
arising from the fixed grid. In the test below we use a grid of 1000 points. We will see 
that this approach allows faster computation at the cost of a small increase in error for 
large s. The tradeoff could be adjusted by adapting the number of points used. 

The other reason for using points spaced uniformly in the probability space is that 
it is relatively easy to estimate an inverse CDF from data. In particular we could, if 
available, take the complete set of distances between all nodes and derive from this an 
empirical estimate of the inverse CDF to use in the MLE calculations. We refer to this 
method as the MLE-E (Empirical). This method avoids the problem of estimating the 
region size and shape, and the assumption that nodes are uniformly distributed over the 
space. 

5. Tests 

In this section we examine the performance of the Waxman graph parameter estimators 
on simulated data. There are three key parameters for a Waxman graph, s, q and n. 
Instead of using these we choose s, n, and the average node degree k which implicitly 
determine q. We primarily focus on fairly sparse networks {k = 3) because sparsity is the 
common case for many real networks, but we do examine the effect of changes in these 
parameters (note there are some combinations of these parameters for which q cannot 
exist). 

The accuracy of the methods increase as the graphs become larger so we choose 
parameter values that result in moderately sized (1000 node) graphs. This ensures that 
errors are of a magnitude that allows us to compare and assess them. However, we do 
also examine the relationship between graph size and accuracy of parameter estimates. 

IIThe Laplace transforms are calculated numerically using Matlab’s integral function, which 
performs adaptive numerical quadrature. 
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Except where otherwise noted we simulate 1000 Waxman random graphs for each 
parameter setting using Matlab, and form estimates of bias and RMS (Root-Mean- 
Square) error for each estimation method. All unannotated results are for the Waxman 
graph on the unit square with the Euclidean distance metric. 


5.1. Comparisons 

We start with a basic test of the estimation methods. We consider the RMS (Root Mean 
Squared) errors of the methods with respect to the CR lower bound as a function of the 
parameter s. Figure 1 shows these errors. The most obvious conclusions are that the 
log-linear regression is the least accurate and that the GLM and MLE estimators are all 
roughly equivalent in accuracy. The minor empirical variants of the log-linear and MLE 
estimators apparently have only small affects on accuracy. 

However, when we examine a more detailed view in Figure 2 some differences are 
apparent. Over the entire range the GLM tracks the CR bound, but for small values of 
s there is some advantage in using the GLM and MLE-E. For very small s, the MLE 
approaches can actually beat the CR bound. This can be explained by the constraint 
s > 0. This constraint leads to a small bias in the MLE estimators for small s. When s 
is around 1, the bias leads to an increase in the RMS errors. When s is very small the 
estimators have slightly more information than assumed in our naive CR bound, and 
this can reduce the variance of the estimator (Gorman and Hero, 1990). 

For large values of s the MLE-E and MLE-N show a small increase in error compared 
to the exact MLE, because for large s the grid size chosen is not small enough for 
accurate integrals to be calculated. This effect could be mitigated by choosing a hner 
grid or by choosing a non-uniform grid with more detail around t = 0 (albeit at additional 
computational cost). We will leave the adaption of grid size to future work. 

For small to moderate values of s, the MLE-E estimator is slightly better than its 
exact cousin. This is valuable, but remember that the MLE-E requires the information 
on all distances, not just those of the existing edges, whereas the MLE and MLE-N 
estimators need only the measurement of d. In fact for large graphs the d used the 
exact MLE could be calculated from a sample of edges, so it is possible to achieve sub- 
0(e) performance for such graphs. We have examined a number of parameter settings 
and have observed the same trends in the results and below we examine accuracy as a 
function of other parameter values. 

The major source of error can be see in Figure 3, which shows the bias in the various 
approaches. We see immediately that the log-linear approach suffers from significant 
bias which increases with s and remains the major source of error in the method. The 
GLM is almost unbiased except for very small networks, where the technique breaks 
down completely, resulting in very large errors. On the other hand, we see that the 
MLE is almost unbiased even for small networks, except for small s where there is a 
small positive bias that explains the increase in RMS errors in that domain. 

We also estimate computation times using Matlab’s tic/toc timers to estimate wall- 
clock time of execution, and take the median over our 1000 samples to provide a robust 
estimate of the typical computation times. Compute times are shown in Figure 4, where 
we can see that the GLM takes quadratic time. The log-linear-E method (not shown) 
is also quadratic, but with a much small constant than the GLM, because a histogram 


RMS error in s RMS error in s 


Roughan et ai. 
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GLM 



log-linear-E 

log-linear 



MLE-E 
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RMS errors as a function of s (fc = 3, n = 1000). See Figure 2 for detaii. 



Fig. 2. RMS errors as a function of s (fc = 3, n = 1000) detaii. 
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■MLE 
■GLM 
■log-linear-E 
■log-linear 
■MLE-E 
-MLE-N 


Fig. 3. Bias in s for the different methods {k = 3,n = 1000). 


must be formed from all of the distances. 

The other methods appear to have constant time with respect to the network size n. 
Constant time is an illusion however: these methods are actually 0(e), which for the 
networks considered is also 0{n). The constant time component is obviously dominant 
for the network sizes considered - we would expect to see the linearity only if we examine 
very large networks. 

Memory requirement for the algorithms are 

• GLM: 0(n2); 

• log-linear (and -E variant): 0{h) where b is the nnmber of histogram bins; 

• MLE: 0(1) (assuming the nnmerical quadrature is memory efficient); and 

• MLE-E and MLE-N: 0(c) where c is the number of grid points. 

Note that all but the GLM use constant memory as a function of network size, as they 
are based on summary statistics that can be computed via a streaming algorithm that 
does not need to hold data in memory. 

There is little doubt that if accuracy is the prime consideration then the GLM is 
the best approach, whereas if there is a need for speed we can trade off a small amonnt 
of accuracy, and use the MLE or MLE-E. However, the practical considerations mean 
the GLM is not viable for large networks. Extrapolating the computation times for the 
GLM shown in Eigure 4, we can see that computation would take in the order of 100 
seconds for a graph of 10,000 nodes (and 15,000 links), or around 3 honrs for a graph 
with 100,000 nodes. Similarly the GLM requires a large amount of memory for larger 
networks, as compared with the almost trivial amount required in the other algorithms. 
On the other hand, the accuracy of the log-linear approaches is poor. Gonseqnently we 
shall consider only the MLE-based estimates in detail in what follows. 
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n (number of nodes) 


Fig. 4. Computation times {s = 4, k = 3). Note that oniy the estimator time is inciuded, not the 
time to form the matrix of distances needed in the GLM and iog-iinear methods. 


5.2. MLE in detail 

We now consider the accuracy of the MLE as both a function of the network size n 
in Figure 5(a) and of the average node degree k in Figure 5(b). The accuracy of the 
method is close to the CR bound for all parameter ranges, but suffers slightly for small 
networks n, and for higher average node degree. 

All of the previous results consider accuracy of parameter estimates of the Waxman 
graph over the unit square. Our approach naturally extends to a variety of scales, 
regions, and distance metrics. Figure 6(a) shows the CR bounds for estimator accuracy 
as a function of the length of the side of the square region, and Figure 6(b) shows these 
bounds for different region shapes. 

In the first case estimator accuracy is worse in smaller regions. In such regions, all 
points are compacted closer together, and there is a smaller range of possible length 
scales. Thus we must estimate the exponential decay factor over a smaller range of 
scales. The effect is more dramatic for smaller s values, where longer links are still 
likely. As s —)• oo the CR bounds converge because the length scale of typical links drops 
well below the lengths of typical (potential) links. 

In the second case there are variations between regions but most have the same rough 
shape as a function of s. Notable exceptions are sphere and hyper-sphere which are not 
monotonically increasing functions of s as the others are. 

It is also notable that all of the methods have the same asymptotic slope for large s 
(though the function for the hyper-sphere is truncated because larger s is invalid given 
the fixed values of n and k used in this plot). The overall conclusion is that the shape of 
the region does have an effect (up to almost a factor of 2 in estimator accuracy at some 
parameter values). However, the the shape of the region is not the most important factor 
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(a) (b) 


Fig. 5. Accuracy of the MLE as a function of other parameters of the network (default parame¬ 
ters n = 1000, s = 3.0, and k = ?,) 



(a) Square region of size L. (b) Different region shapes. 

Fig. 6. CR bounds for different regions. 


in determining the accuracy of the estimator: the size of the graph, and its parameters 
play a larger role. 


5.3. Robustness 

More importantly, it is possible that we will not know the exact shape of the region of 
interest, or that the shape used in estimation is an approximation of an irregular region. 
In these cases we could use the empirical estimates of the potential links, but if the shape 
of the region is unknown, it is important to consider the impact of an erroneous decision. 
Figure 7 shows that impact by highlighting the relative size of the induced error as a 
result of assuming the wrong shaped region for three different region shapes (a square, 
a disk and a rectangle with an aspect ratio of 2:1). 

Figure 7(a) shows that there is a certain robustness, as choosing the disk instead of the 
square or visa versa leads to only a small increase in error. On the other hand, assuming 
a square or disk when the region is really a rectangle causes much larger errors. The 
difference is that the longer thinner rectangle has a considerably larger boundary to area 
proportion, and the boundary effects play a larger role in the random line distribution. 

Comparison of Figure 7(a) and (b) suggests that the effect is more significant for 
smaller values of s. This is because larger values of s result in shorter links on average 
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(a) s = 3.0. 


(b) s = 20.0. 


Fig. 7. Relative RMS error when the assumed region shape is incorrect. Note that errors are 
much smaller for large s values, because boundary affects play a lesser role. 


hence boundary effects play a smaller role. So unsurprisingly, the estimates are more 
robust for large s. 

We also consider the impact of estimating the size of the region. Figure 8 shows RMS 
errors for a disk-shaped region when the diameter is known (Real), when it is estimated 
by taking the largest link-distance (Est 1), and when it is estimated using the largest 
inter-node distance (Est 2). 

The loss in estimator accuracy when the size of the region is unknown is clear. Fur¬ 
ther, the inaccuracy increases with s because as s increase, the longest observed link 
is less likely to be a proxy for the actual size of the region. We could correct for this 
bias, but instead we compare to estimates of the region size formed from the maximum 
inter-node distance, and we see that there is almost no loss of accuracy in the estimator. 

The conclusion is that some knowledge of the region over which the graph is drawn 
is important. Very poor estimates of the region can result in much worse estimator 
accuracy, but there is a fair amount of robustness so the estimate of the region shape 
and scale need not be perfect and estimates drawn from the data are sufficient. 

Finally, to test how robust the model outputs are when the underlying model is 
incorrect, i.e., the distance deterrence function is not Waxman, we test the performance 
of the MLE when the underlying graph is actually the DASTB model (Davis et ah, 2014) 
used for the vole contact graph data. We use this model as a comparison, because it 
is very similar in concept, but has different details. Figure 9 shows that there is some 
very small induced bias in the estimates as a result of applying the model in the wrong 
situation, but these would have almost no influence on the conclusions draw from s 
estimates. 


5.4. Estimating q 

In most of the preceding work we have only considered the accuracy of the estimates 
for s. The estimated accuracy of q is that of the binomial model parameter estimate 
which is well understood except for the manner in which errors in s propagate into the 
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Fig. 8. The impact of estimating the size of the region. Reai is the MLE given the reai region 
size. Est 1 is the MLE given the region size is estimated using the iongest iink, and Est 2 is 
the MLE given the region size is estimated from the iongest inter-node distance. We see that 
estimating the region size using the inter-node distances introduced minimai errors, but using 
the iink distances increases the errors, except for very smaii s. 



Fig. 9. Estimates s when the underiying network is not Waxman, but rather drawn from the 
DASTB voie-contact modei. For smaiier s, using the wrong modei introduces minimai addition 
error. 
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Fig. 10. Relative error for the MLE q {n ^ 1000, k = 3). Squares indicate the RMS error 
relative to q assuming we were given the true s values, and circles indicate the error of the 
actual estimator. 


estimate q through G{s). 

Figure 10 shows the RMS errors in q relative to the size of q, given the estimated and 
true values of s. We can see from the figure that the errors in q are roughly doubled by 
the uncertainty in s, so this error does have a significant effect. However, the errors in 
q are still small: i.e., of the order of 4%. 

6. Case studies 

The previous section looked at accuracy on simulated data, where we know the ground 
truth. This section considers how well the method works on real data. We consider 
three datasets, one biological and two Internet. 


6.1. Vole data 

Davis et al. (2014) collected a large mark-recapture dataset on the English-Scottish 
border over a 7-year period to study relationships between field voles (Microtus agrestis). 
In particular, they considered a number of models for the contact graphs created by 
relationships between the voles. A grid of trap locations was created on each of four 
sites and over 64 capture periods traps were emptied multiple times. Contact graphs 
were built by associating voles that were caught in the same trap on separate occasions. 
In their main dataset, the time periods were divided into pairs and each pair used to 
construct one graph, so in total 32 graphs were created for each site. One might argue 
with the methodological accuracy of constructing contact graphs from trap data in this 
manner, but there is no doubt that this is an unusually large dataset of related graphs. It 
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(a) Reproduction of Figure 5(a) from 
(Davis et al., 2014). 


BHP 

• 


• 

• • 

• 


• • • 

• 

• • * ••* 

. • 

/%• . # 

* • • 

r=0.7002 




0 50 100 150 200 

popuiation density estimate 


(b) Equivalent plot of s estimates against 
population density. 


Fig. 11. Vole data: data drawn from the Black Blake Hope (BHP) site: r reports the correlation 
coefficients. 


is rare to have more than a few example contact graphs in any one application. The large 
number of these graphs allows some views of estimates that are not usually available. 

Davis et al. (2014) fitted multiple models to the data including a GER random graph, 
and their DASTB model, which resembles the Waxman graph. Their model incorporated 
the capture process as an explicit component and they form an edge in the graph when 
one or more contacts occurred. 

Using the original raw data provided by the authors we reconstructed their contact 
graphs and confirmed we could reproduce the results obtained in (Davis et ah, 2014). 
Estimates of A (their distance parameter) were recomputed using the complimentary log- 
log regression to confirm that the recreated graphs and their measured statistics were 
correct, and that the units were correct. Figure 11(a) reproduces Davis et al. (2014, 
Figure 5(a)), demonstrating that the data has been correctly reconstructed. 

Next we performed our own estimates using the MLE with the underlying region 
being a square of side 10 (comprised of 10 traps with a 5m spread between each, thus 
matching the experiment. Figure 11(b) shows our estimates (we have repeated the 
same procedure for all four original datasets with similar results). We can see from 
the graph that although estimates of s are not the same as A, the general conclusion 
of (Davis et ah, 2014) that there is a correlation between population density and the 
spatial contact parameter is supported with almost the same correlation coefficient being 
reported for both datasets. 

The results emphasise the original findings of Davis et al., notably that: 

• Animal contact graphs are well modelled by spatial graphs (in our case there is a 
small difference in the particular model, but the general behaviour is similar). 

• This modelling provides a means of quantifying differences between two groups of 
animals (separated in time or space). 

• The parameter s tells us something important that other many models omit, namely 
that the distance voles travel when contacting each other decreases with population 
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Table 1. Summary of Mercator datasets: 
number of nodes, edges, and the average link 
distance. 


Region 

n 

e 

d (miles) 

USA 

123426 

152602 

384.7 

EUROPE 

32928 

30049 

319.5 

JAPAN 

14318 

16665 

317.6 


density. Thus has a profound importance for disease transmission (Davis et ah, 
2014) in that population mixing may not increase with population density as quickly 
as is often assumed. 

More generally, models can be used for the usual suite of purposes such as simulation and 
extrapolation. For example, once we know the length scale at which contact is unlikely, 
we might use this in the design of capture/recapture experiments by choosing the scale 
and resolution of the capture setup. 


6.2. Internet dataset 1 

Lakhina et al. (2002) undertook one of the first attempts to formally quantify the ex¬ 
ponential decrease of link likelihood as a function of distance. The authors compared 
two sets of data and found consistent results between them. They provided one of these 
datasets (the Mercator data) to us for comparison. Lakhina et al. separated the data 
into three regions, and analysed these separately. Table 1 provides a brief summary of 
the three regions. 

We do not argue that this network is random in any real sense: in fact the Internet 
networks are the result of design. However, fitting a Waxman-like graph to these is 
instructive in that it shows how engineering constraints lead to distance-sensitive link 
placement. 

We can see first from the data that the graphs are large but very sparse, with av¬ 
erage node degrees of around 2. In this dataset we also have node locations so we can 
derive distances between all pairs of nodes and thus apply all of the possible techniques 
considered here. However, it is not feasible to use the GLM approach for this scale of 
problem. 

Lakhina et al. (2002) applied log-linear regression to the question. We applied the 
MLE and compared the results to those found in their work. Table 2 provides a compar¬ 
ison between various estimates, including the original values reported in Lakhina et al. 
(2002) given in the second column under sLakhina- Units are per 1000 miles — we use 
Imperial units to be consistent with the original paper. The third column provides our 
equivalent estimate. There are small differences, presumably because of differences in 
the exact numerical procedures applied. 

The fourth column of the table shows the MLE-E values estimated for the datasets. 
We use the Empirical estimator because the region shapes are irregular {e.g., the USA), 
and we want to avoid approximation errors arising from the region shape. 

We see considerable discrepancies which are larger than can be explained by errors 
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Table 2. Estimates: SLakhina are the values from Lakhina 
et al. (2002); and hog-iinear are our corresponding estimates. 

Units are per 1000 mites. The smle-e values are derived 
from our Empirical MLE, and VnesMLE-T values from a ver¬ 
sion of the MLE with distance data truncated in the same 
manner as the original log-linear estimates. 


Region 

^Lakhina 

^log —linear 

Smle-e 

Smle-t 

USA 

6.91 

6.38 

2.75 

6.63 

EUROPE 

12.80 

12.81 

30.92 

10.09 

JAPAN 

6.89 

6.71 

45.91 

7.30 


in the log-linear regression approach. However, reading Lakhina et al. (2002), we can 
see that their estimates are over a truncated range of distances for two reasons: 

• The node locations they use are artificially quantised by the Geolocation procedure 
so some nodes appear to have exactly the same position, and hence zero distance, 
when actually there is a positive distance between the nodes; and 

• They found that the exponential distance-deterrence function fit the data only up 
to some threshold distance. 

In their (and our comparison) log-linear regressions the range over which we perform 
the regression is restricted to be between these bounds. 

In order to provide a fair comparison we also modified the MLE-E by censoring the 
potential edges used in forming the CDE and in computing the average edges distance. 
Table 2 shows the results under smle-t to be closer to being consistent with the log- 
linear regression. 

The results point to one valuable feature of the log-linear regression, which is that it 
comes with diagnostics. Examination of the fit indicates whether the model is appropri¬ 
ate or not. The MLE requires additional effort to provide similar diagnostics. On the 
other hand, there are significant issues with the log-linear regression. Apart from being 
less accurate, there is the question of bin size which simple experiments seem to suggest 
has more effect on estimates than one might hope. 

Ultimately, all of the methods suggest strongly that a spatial component should be 
part of any model for Internet linkages. This is entirely consistent with the intuition of 
engineers who work on such networks: long links cost more, and so are rarer. 


6.3. Internet dataset 2 

Einally we apply the MLE to a set of real networks from the Internet Topology Zoo (Knight 
et ah, 2011). The networks - taken at Point of Presence (PoP) level - show the con¬ 
nectivity of a sample of seven major network operators in different regions of the world. 
Pigure 12 shows one example, and the seven are summarised and results shown in Ta¬ 
ble 3. 

These graphs are much smaller as a result of being drawn at the PoP level, and so 
errors in the estimates will be correspondingly larger. However, there is a very clear 
variation in the parameters. 
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Uunet 



Fig. 12 . The Uunet network from the Internet Topology Zoo. 
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Table 3. Estimates for Internet Topology Zoo data. Units for s 
are per 1000 km. 


Network 

Region 

n 

e 

d 

smle-e 

Aarnet 

Australia 

19 

24 

695.6 

2.20 

linet 

Australia 

9 

12 

1472.9 

0.30 

BtEurope 

Europe 

22 

35 

606.3 

2.05 

Colt 

Europe 

153 

177 

160.2 

9.48 

Abilene 

USA 

11 

14 

1007.0 

1.39 

Internetmci 

USA 

19 

33 

927.7 

1.25 

Uunet 

USA 

42 

77 

966.9 

1.09 


Firstly, although the units for s here are per 1000 km, we can clearly see that the 
s estimates are a good deal smaller than for the previous dataset, largely because of 
looking at a single network at a time, at the PoP level. Router-level networks (such as 
the previous dataset) include numerous routers, many of them connected, in the same 
cities. The interconnects between networks also tend to be shorter as they are mediated 
through Internet exchange points or carrier hotels. However, at the PoP level, all of these 
small-scale details are invisible, and consequently we see less strong distance deterrence. 

Secondly, we see a wide variety of s values, with little relationship to region. Networks 
are just different. They are built with different goals, and different cost structures and 
constraints. Some are new, and others are older and incorporate legacy components. 

The wide variety of network structures has been observed in the Zoo data before 
(Bowden et ah, 2014), but not with respect to spatial structure. This reinforces the 
message that estimates of model parameters such as s provide alternative methods to 
compare network behaviour. 

7. Discussion and Conciusion 

This paper presents the MLE for the parameters of the Waxman graph and demonstrates 
its accuracy in comparison to alternative estimators. The MLE has two advantages. 
Firstly it can guarantee 0{n) computational time complexity and constant memory 
usage by using only a sample of the edges that exist in a graph to estimate s. Secondly 
it can be applied in domains where the coordinates of nodes are unknown and/or edge 
lengths may be weights in some arbitrary process. Using the MLE we have shown that 
real networks have a considerably wider range of s parameters than is typically used in 
the literature. 

The next question might be: “Is the Waxman model a better model than X?” where 
X might be the GER random graph, or some other model. Davis et al. (2014) considered 
this question using AIC, but for the simple question of testing whether there should be 
a distance term or not, it is perhaps more natural to use a hypothesis test. We applied 
the standard likelihood ratio test and found that standard threshold choices resulted in 
poor ability to specify the significance. It seems the problem arises from the failure of 
the independence assumption. If so the problem is correctable, but we leave the question 
of correctly determining the cutoff threshold for future work. 
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